Goto

Collaborating Authors

 competitive learning


Sparse Mixture of Experts as Unified Competitive Learning

arXiv.org Artificial Intelligence

Sparse Mixture of Experts (SMoE) improves the efficiency of large language model training by directing input tokens to a subset of experts. Despite its success in generation tasks, its generalization ability remains an open question. In this paper, we demonstrate that current SMoEs, which fall into two categories: (1) Token Choice ;and (2) Expert Choice, struggle with tasks such as the Massive Text Embedding Benchmark (MTEB). By analyzing their mechanism through the lens of competitive learning, our study finds that the Token Choice approach may overly focus on irrelevant experts, while the Expert Choice approach risks discarding important tokens, potentially affecting performance. Motivated by this analysis, we propose Unified Competitive Learning SMoE (USMoE), a novel and efficient framework designed to improve the performance of existing SMoEs in both scenarios: with and without training. Extensive experiments across various tasks show that USMoE achieves up to a 10% improvement over traditional approaches or reduces computational inference costs by 14% while maintaining strong performance.


Gradient-based Competitive Learning: Theory

arXiv.org Machine Learning

Deep learning has been widely used for supervised learning and classification/regression problems. Recently, a novel area of research has applied this paradigm to unsupervised tasks; indeed, a gradient-based approach extracts, efficiently and autonomously, the relevant features for handling input data. However, state-of-the-art techniques focus mostly on algorithmic efficiency and accuracy rather than mimic the input manifold. On the contrary, competitive learning is a powerful tool for replicating the input distribution topology. This paper introduces a novel perspective in this area by combining these two techniques: unsupervised gradient-based and competitive learning. The theory is based on the intuition that neural networks are able to learn topological structures by working directly on the transpose of the input matrix. At this purpose, the vanilla competitive layer and its dual are presented. The former is just an adaptation of a standard competitive layer for deep clustering, while the latter is trained on the transposed matrix. Their equivalence is extensively proven both theoretically and experimentally. However, the dual layer is better suited for handling very high-dimensional datasets. The proposed approach has a great potential as it can be generalized to a vast selection of topological learning tasks, such as non-stationary and hierarchical clustering; furthermore, it can also be integrated within more complex architectures such as autoencoders and generative adversarial networks.


Topological Gradient-based Competitive Learning

arXiv.org Machine Learning

Topological learning is a wide research area aiming at uncovering the mutual spatial relationships between the elements of a set. Some of the most common and oldest approaches involve the use of unsupervised competitive neural networks. However, these methods are not based on gradient optimization which has been proven to provide striking results in feature extraction also in unsupervised learning. Unfortunately, by focusing mostly on algorithmic efficiency and accuracy, deep clustering techniques are composed of overly complex feature extractors, while using trivial algorithms in their top layer. The aim of this work is to present a novel comprehensive theory aspiring at bridging competitive learning with gradient-based learning, thus allowing the use of extremely powerful deep neural networks for feature extraction and projection combined with the remarkable flexibility and expressiveness of competitive learning. In this paper we fully demonstrate the theoretical equivalence of two novel gradient-based competitive layers. Preliminary experiments show how the dual approach, trained on the transpose of the input matrix i.e. $X^T$, lead to faster convergence rate and higher training accuracy both in low and high-dimensional scenarios.


Competitive Learning Enriches Learning Representation and Accelerates the Fine-tuning of CNNs

arXiv.org Machine Learning

In this study, we propose the integration of competitive learning into convolutional neural networks (CNNs) to improve the representation learning and efficiency of fine-tuning. Conventional CNNs use back propagation learning, and it enables powerful representation learning by a discrimination task. However, it requires huge amount of labeled data, and acquisition of labeled data is much harder than that of unlabeled data. Thus, efficient use of unlabeled data is getting crucial for DNNs. To address the problem, we introduce unsupervised competitive learning into the convolutional layer, and utilize unlabeled data for effective representation learning. The results of validation experiments using a toy model demonstrated that strong representation learning effectively extracted bases of images into convolutional filters using unlabeled data, and accelerated the speed of the fine-tuning of subsequent supervised back propagation learning. The leverage was more apparent when the number of filters was sufficiently large, and, in such a case, the error rate steeply decreased in the initial phase of fine-tuning. Thus, the proposed method enlarged the number of filters in CNNs, and enabled a more detailed and generalized representation. It could provide a possibility of not only deep but broad neural networks.


Biologically Inspired Feedforward Supervised Learning for Deep Self-Organizing Map Networks

arXiv.org Machine Learning

In this study, we propose a novel deep neural network and its supervised learning method that uses a feedforward supervisory signal. The method is inspired by the human visual system and performs human-like association-based learning without any backward error propagation. The feedforward supervisory signal that produces the correct result is preceded by the target signal and associates its confirmed label with the classification result of the target signal. It effectively uses a large amount of information from the feedforward signal, and forms a continuous and rich learning representation. The method is validated using visual recognition tasks on the MNIST handwritten dataset.


Competitive Learning with Feedforward Supervisory Signal for Pre-trained Multilayered Networks

arXiv.org Machine Learning

We propose a novel learning method for multilayered neural networks which uses feedforward supervisory signal and associates classification of a new input with that of pre-trained input. The proposed method effectively uses rich input information in the earlier layer for robust leaning and revising internal representation in a multilayer neural network.


A Silicon Primitive for Competitive Learning

Neural Information Processing Systems

Competitive learning is a technique for training classification and clustering networks. We have designed and fabricated an 11-transistor primitive, that we term an automaximizing bump circuit, that implements competitive learning dynamics. The circuit performs a similarity computation, affords nonvolatile storage, and implements simultaneous local adaptation and computation. We show that our primitive is suitable for implementing competitive learning in VLSI, and demonstrate its effectiveness in a standard clustering task. 1 Introduction Competitive learning is a family of neural learning algorithms that has proved useful for training many classification and clustering networks [1]. In these networks, a neuron's synaptic weight vector typically represents a tight cluster of data points.


A Silicon Primitive for Competitive Learning

Neural Information Processing Systems

Competitive learning is a technique for training classification and clustering networks. We have designed and fabricated an 11-transistor primitive, that we term an automaximizing bump circuit, that implements competitive learning dynamics. The circuit performs a similarity computation, affords nonvolatile storage, and implements simultaneous local adaptation and computation. We show that our primitive is suitable for implementing competitive learning in VLSI, and demonstrate its effectiveness in a standard clustering task. 1 Introduction Competitive learning is a family of neural learning algorithms that has proved useful for training many classification and clustering networks [1]. In these networks, a neuron's synaptic weight vector typically represents a tight cluster of data points.


A Silicon Primitive for Competitive Learning

Neural Information Processing Systems

Competitive learning is a technique for training classification and clustering networks. We have designed and fabricated an 11-transistor primitive, that we term an automaximizing bump circuit, that implements competitive learning dynamics. The circuit performs asimilarity computation, affords nonvolatile storage, and implements simultaneous local adaptation and computation. We show that our primitive is suitable for implementing competitive learning in VLSI, and demonstrate its effectiveness in a standard clustering task. 1 Introduction Competitive learning is a family of neural learning algorithms that has proved useful fortraining many classification and clustering networks [1]. In these networks, a neuron's synaptic weight vector typically represents a tight cluster of data points.


Task and Spatial Frequency Effects on Face Specialization

Neural Information Processing Systems

There is strong evidence that face processing is localized in the brain. The double dissociation between prosopagnosia, a face recognition deficit occurring after brain damage, and visual object agnosia, difficulty recognizing otber kinds of complex objects, indicates tbat face and nonface objectrecognition may be served by partially independent mechanisms inthe brain. Is neural specialization innate or learned? We suggest that this specialization could be tbe result of a competitive learning mechanism that, during development, devotes neural resources to the tasks they are best at performing. Furtber, we suggest that the specialization arisesas an interaction between task requirements and developmental constraints. In this paper, we present a feed-forward computational model of visual processing, in which two modules compete to classify input stimuli. When one module receives low spatial frequency information andthe other receives high spatial frequency information, and the task is to identify the faces while simply classifying the objects, the low frequency network shows a strong specialization for faces. No otber combination of tasks and inputs shows this strong specialization. We take these results as support for the idea that an innately-specified face processing module is unnecessary.